Fast Discovery of Relevant Subgroups using a Reduced Search Space

Authors

  • Henrik Grosskreutz
  • Daniel Paurat
Abstract

We consider a modified version of the local pattern discovery task of subgroup discovery, where subgroups dominated by other subgroups are discarded. The advantage of this modified task, known as relevant subgroup discovery, is that it avoids redundancy in the outcome. Although it was considered in many applications, so far no efficient and exact algorithm for this task has been proposed. One particular problem is that the correctness is not guaranteed if the standard pruning approach is applied. In this paper, we devise a new algorithm based on two ideas: For one, we use the theory of closed sets for labeled data to reduce the candidate space; for another we introduce a special search space traversal which allows the use of optimistic estimate pruning while guaranteeing the correctness of the solution. We show that although our algorithm solves a more valuable task than other (classical) approaches, it outperforms all existing subgroup discovery algorithms.
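The abstract does not spell out what "dominated" means. As a minimal sketch, assume the definition commonly used for relevant subgroup discovery on labeled data: one subgroup dominates another if it covers all of the other's positive records and covers no negative records beyond those the other covers. The names `Subgroup`, `dominates`, and `relevant` are hypothetical and chosen only for illustration. The brute-force filter below is not the paper's algorithm, which relies on closed sets and a special search space traversal; it only shows which subgroups the modified task would discard.

```python
from typing import FrozenSet, List, NamedTuple


class Subgroup(NamedTuple):
    name: str
    positives: FrozenSet[int]  # ids of covered records that carry the target label
    negatives: FrozenSet[int]  # ids of covered records that do not


def dominates(a: Subgroup, b: Subgroup) -> bool:
    """Assumed definition: a covers every positive of b and no extra negatives."""
    return b.positives <= a.positives and a.negatives <= b.negatives


def relevant(subgroups: List[Subgroup]) -> List[Subgroup]:
    """Quadratic filter that drops dominated subgroups (illustration only)."""
    kept: List[Subgroup] = []
    for s in subgroups:
        # Drop s if some other subgroup strictly dominates it.
        if any(dominates(o, s) and not dominates(s, o)
               for o in subgroups if o is not s):
            continue
        # Among mutually dominating (equivalent) subgroups, keep the first one.
        if any(dominates(k, s) and dominates(s, k) for k in kept):
            continue
        kept.append(s)
    return kept


a = Subgroup("a", frozenset({1, 2, 3}), frozenset({10}))
b = Subgroup("b", frozenset({1, 2}), frozenset({10, 11}))
c = Subgroup("c", frozenset({4}), frozenset())
print([s.name for s in relevant([a, b, c])])  # ['a', 'c']
```

Here `b` is discarded because `a` covers all of its positives and fewer negatives, while `c` survives because it covers a positive record that no other subgroup does.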


Similar resources

Fast and Memory-Efficient Discovery of the Top-k Relevant Subgroups in a Reduced Candidate Space

We consider a modified version of the top-k subgroup discovery task, where subgroups dominated by other subgroups are discarded. The advantage of this modified task, known as relevant subgroup discovery, is that it avoids redundancy in the outcome. Although it has been applied in many applications, so far no efficient exact algorithm for this task has been proposed. Most existing solutions do n...


OPTIMAL DESIGN OF STEEL MOMENT FRAME STRUCTURES USING THE GA-BASED REDUCED SEARCH SPACE (GA-RSS) TECHNIQUE

This paper proposes a GA-based reduced search space technique (GA-RSS) for the optimal design of steel moment frames. It tries to reduce the computation time by focusing the search around the boundaries of the constraints, using a ranking-based constraint handling to enhance the efficiency of the algorithm. This attempt to reduce the search space is due to the fact that in most optimization pro...


OPTIMIZATION OF SKELETAL STRUCTURES USING IMPROVED GENETIC ALGORITHM BASED ON PROPOSED SAMPLING SEARCH SPACE IDEA

This article attempts to increase the speed of GA-based optimization by partitioning the design space. To this end, the design space is searched in two steps: a global search and a local search. Following the meshing used in FEM, the list of sections is first divided into specific subsets. Then the intermediate member of each subset, as the representative of the subset, is defined...


Fast Discovery of Relevant Subgroup Patterns

Subgroup discovery is a prominent data mining method for discovering local patterns. Since often a set of very similar, overlapping subgroup patterns is retrieved, efficient methods for extracting a set of relevant subgroups are required. This paper presents a novel algorithm based on a vertical data structure, that not only discovers interesting subgroups quickly, but also integrates efficient...


A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters

Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...




Publication date: 2010